No Longer Confidential: Estimating the Confidence of Individual Regression Predictions
Authors
Abstract
Quantitative predictions in computational life sciences are often based on regression models. The advent of machine learning has led to highly accurate regression models that have gained widespread acceptance. While statistical methods are available to estimate the global performance of regression models on a test or training dataset, it is often unclear how well this performance transfers to other datasets or how reliable an individual prediction is, a fact that often reduces a user's trust in a computational method. In analogy to the concept of an experimental error, we sketch how estimators for individual prediction errors can be used to provide confidence intervals for individual predictions. Two novel statistical methods, named CONFINE and CONFIVE, can estimate the reliability of an individual prediction based on the local properties of nearby training data. The methods can be applied equally to linear and non-linear regression methods with very little computational overhead. We compare our confidence estimators with other existing confidence and applicability domain estimators on two biologically relevant problems: MHC-peptide binding prediction and quantitative structure-activity relationship (QSAR) modeling. Our results suggest that the proposed confidence estimators perform comparably to or better than previously proposed estimation methods. Given a sufficient amount of training data, the estimators yield high-quality error estimates. In addition, we observed that the quality of the estimated confidence intervals is predictable. We discuss how confidence estimation is influenced by noise, the number of features, and the dataset size. Estimating the confidence in individual predictions in terms of error intervals is an important step from plain, non-informative predictions towards transparent and interpretable predictions, which will help to improve the acceptance of computational methods in the biological community.
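The abstract describes estimating the reliability of an individual prediction from nearby training data and turning that estimate into a confidence interval. The following is a minimal sketch of that general idea, not the authors' CONFINE or CONFIVE definitions. It assumes Euclidean distance in feature space, out-of-sample (for example cross-validated) absolute residuals for the training points, and a separate calibration set. A query is scored by the mean residual of its k nearest training neighbors, and the score is converted into an interval using an empirical error quantile among calibration points with similar scores. The function names and the parameters `k`, `m` and `q` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest_indices(x: Vector, train_X: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k training points closest to x (brute force: sorts all points)."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(x, train_X[i]))
    return order[:k]


def neighbor_error_score(x: Vector, train_X: Sequence[Vector],
                         train_abs_err: Sequence[float], k: int = 5) -> float:
    """Hypothetical local-reliability score: mean absolute residual of the k
    nearest training points. A larger value means a less reliable prediction.
    ASSUMPTION: train_abs_err[i] is out-of-sample (e.g. from cross-validation)."""
    idx = k_nearest_indices(x, train_X, k)
    return sum(train_abs_err[i] for i in idx) / len(idx)


def error_interval(y_hat: float, score: float, calib_scores: Sequence[float],
                   calib_abs_err: Sequence[float], m: int = 20,
                   q: float = 0.9) -> Tuple[float, float]:
    """Hypothetical calibration scheme: return y_hat +/- h, where h is the
    empirical q-quantile of absolute errors among the m calibration points
    whose scores are closest to `score`."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    order = sorted(range(len(calib_scores)),
                   key=lambda i: abs(calib_scores[i] - score))[:m]
    errs = sorted(calib_abs_err[i] for i in order)
    # Smallest sorted position with at least a fraction q of the errors <= it.
    j = min(len(errs) - 1, max(0, math.ceil(q * len(errs)) - 1))
    h = errs[j]
    return (y_hat - h, y_hat + h)


if __name__ == "__main__":
    # Toy 1-D data: the model is accurate near x = 0..3 and poor near x = 10..13.
    train_X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
    train_abs_err = [0.1, 0.1, 0.1, 0.1, 2.0, 2.0, 2.0, 2.0]
    calib_scores = [0.1] * 5 + [2.0] * 5
    calib_abs_err = [0.05, 0.1, 0.15, 0.2, 0.25, 1.0, 2.0, 3.0, 4.0, 5.0]

    for x, y_hat in [([1.5], 10.0), ([11.5], 10.0)]:
        s = neighbor_error_score(x, train_X, train_abs_err, k=4)
        lo, hi = error_interval(y_hat, s, calib_scores, calib_abs_err, m=5, q=0.9)
        print(f"x={x[0]}: score={s:.2f}, interval=({lo:.2f}, {hi:.2f})")
```

On the toy data, the query near the well-predicted region gets a narrow interval (width 0.5), and the query at x = 11.5 gets a much wider one (width 10). The brute-force neighbor search keeps the sketch dependency-free. For large training sets, a spatial index would be the usual replacement.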
Similar resources
Estimating the Time of a Step Change in Gamma Regression Profiles Using MLE Approach
Sometimes the quality of a process or product is described by a functional relationship between a response variable and one or more explanatory variables, referred to as a profile. In most research in this area the response variable is assumed to be normally distributed; however, occasionally in certain applications, the normality assumption is violated. In these cases the Generalized Linear Mod...
Estimating Confidence Values of Individual Predictions by their Typicalness and Reliability
Although machine learning algorithms have been successfully used in many problems, and are emerging as valuable data analysis tools, their serious practical use is affected by the fact that often they cannot produce reliable and unbiased assessments of their predictions’ quality. There exist several approaches for estimating reliability or confidence for individual classifications, and many of ...
Estimating process capability indices using ridge regression
Process capability indices show the ability of a process to produce products according to the pre-specified requirements. Since final quality characteristics of a product are usually interrelated to its previous amounts in earlier workstations, one needs to model and consider the relationship among them to assess the process capability properly. Hence, conducting process capability analysis in ...
Stealing Machine Learning Models via Prediction APIs
Machine learning (ML) models may be deemed confidential due to their sensitive training data, commercial value, or use in security applications. Increasingly often, confidential ML models are being deployed with publicly accessible query interfaces. ML-as-a-service (“predictive analytics”) systems are an example: Some allow users to train models on potentially sensitive data and charge others f...
Spatial Regression in the Presence of Misaligned data
In this paper, four approaches are presented to the problem of fitting a linear regression model in the presence of spatially misaligned data. These approaches are plug-in method, simulation, regression calibration and maximum likelihood. In the first two approaches, with modeling the correlation between the explanatory variable, prediction of explanatory variable is determined at sites...
Journal title:
Volume 7, Issue:
Pages: -
Publication date: 2012